Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions
Pruning is one of the predominant approaches for compressing deep neural
networks (DNNs). Lately, coresets (provable data summarizations) were leveraged
for pruning DNNs, adding the advantage of theoretical guarantees on the
trade-off between the compression rate and the approximation error. However,
coresets in this domain were either data-dependent or generated under
restrictive assumptions on both the model's weights and inputs. In real-world
scenarios, such assumptions are rarely satisfied, limiting the applicability of
coresets. To this end, we suggest a novel and robust framework for computing
such coresets under mild assumptions on the model's weights and without any
assumption on the training data. The idea is to compute the importance of each
neuron in each layer with respect to the output of the following layer. This is
achieved by a combination of the Löwner ellipsoid and the Carathéodory theorem. Our
method is simultaneously data-independent, applicable to various networks and
datasets (due to the simplified assumptions), and theoretically supported.
Experimental results show that our method outperforms existing coreset-based
neural pruning approaches across a wide range of networks and datasets. For
example, our method achieved a compression rate on ResNet50 on ImageNet
with a drop in accuracy.
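The neuron-importance idea above can be illustrated with a minimal sketch. The paper's actual scores come from a Löwner ellipsoid and the Carathéodory theorem; the norm-based importance below is a simplified, hypothetical stand-in that only shows the overall flow (score each neuron by its contribution to the next layer, keep the top scorers, shrink both weight matrices accordingly):

```python
import numpy as np

# Simplified sketch of pruning a layer's neurons by importance with respect
# to the next layer's output. NOTE: the importance score here (product of
# weight norms) is an illustrative assumption, not the paper's
# Löwner-ellipsoid/Carathéodory construction.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 32))   # layer: 32 inputs -> 64 neurons
W2 = rng.normal(size=(10, 64))   # next layer: 64 neurons -> 10 outputs

# Hypothetical importance of neuron j: its own weight scale (row norm of W1)
# times how strongly the next layer reads it (column norm of W2).
importance = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)

k = 16                            # number of neurons to keep
keep = np.argsort(importance)[-k:]

W1_pruned = W1[keep, :]           # shape (16, 32)
W2_pruned = W2[:, keep]           # shape (10, 16)
```

Both matrices shrink consistently: the pruned layer still maps 32 inputs to 10 outputs, just through 16 neurons instead of 64.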
No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks
A common technique for compressing a neural network is to compute the k-rank ℓ2 approximation Ak of the matrix A∈Rn×d via SVD that corresponds to a fully connected layer (or embedding layer). Here, d is the number of input neurons in the layer, n is the number of neurons in the next one, and Ak is stored in O((n+d)k) memory instead of O(nd). Then, a fine-tuning step is used to improve this initial compression. However, end users may not have the required computation resources, time, or budget to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks using a similar initial compression time (to common techniques) but without the fine-tuning step. The main idea is replacing the k-rank ℓ2 approximation with ℓp, for p∈[1,2], which is known to be less sensitive to outliers but much harder to compute. Our main technical result is a practical and provable approximation algorithm to compute it for any p≥1, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
Coresets for the Average Case Error for Finite Query Sets
A coreset is usually a small weighted subset of an input set of items that provably approximates their loss function for a given set of queries (models, classifiers, hypotheses). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset problem to the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for n≫1/ε, where ε is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.
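The vector summarization problem that the average-case coreset reduces to can be illustrated with a minimal sketch. The importance sampler below is a hypothetical stand-in (sampling proportionally to vector norms with inverse-probability weights, which makes the weighted-subset sum an unbiased estimate of the full sum); the paper's deterministic and linear-time algorithms are more involved.

```python
import numpy as np

# Vector summarization sketch: pick a small weighted subset of n input
# vectors whose weighted sum approximates the full sum. The norm-based
# importance sampling below is an illustrative assumption, not the
# paper's algorithm.

rng = np.random.default_rng(2)
n, d = 10_000, 5
V = rng.normal(loc=1.0, size=(n, d))     # input vectors (nonzero mean sum)

norms = np.linalg.norm(V, axis=1)
p = norms / norms.sum()                  # sampling distribution
m = 500                                  # subset size, m << n
idx = rng.choice(n, size=m, p=p)
weights = 1.0 / (m * p[idx])             # inverse-probability weights

approx_sum = (weights[:, None] * V[idx]).sum(axis=0)
true_sum = V.sum(axis=0)
```

The weighted subset `(idx, weights)` plays the role of the coreset: downstream computations that only need the (approximate) sum can use the 500 weighted vectors in place of all 10,000.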